Statistical parsing with non-local dependencies

Author

  • Péter Dienes
Abstract

Non-local dependencies occur when a head (e.g. a verb) and its dependent (e.g. an argument) are separated by intervening material which does not depend on the same head. Because non-local dependencies play an important role in determining the predicate–argument structure of sentences, identifying them correctly is essential for Natural Language Processing. However, this task appears to be rather difficult, and for this reason non-local dependencies have received only a limited amount of attention in the statistical parsing literature. This thesis explores the problems that identifying non-local dependencies poses for statistical parsing technology. We argue that the difficulties are due to the enlarged search space, an effect of the large number of locally unresolvable ambiguities introduced by non-local dependencies. We show that the search space can be efficiently reduced by taking lexical and local information into account. We present several simple parsing models incorporating this knowledge. We claim that non-local dependencies in English can be efficiently and accurately recovered by an appropriate combination of shallow approaches. In particular, we show that a finite-state machine without explicit knowledge of phrase structure information is able to detect heads participating in non-local constructions with state-of-the-art accuracy. This machine is employed to constrain the search space of a phrase-structure parser. The parser, when coupled with the finite-state preprocessor, is fast and achieves the best reported results on recovering non-local dependencies. The accuracy of the system crucially depends on the way the parser and the preprocessor are combined. We develop a novel probabilistic framework where a preprocessor guides a parser by imposing soft constraints on the search space. The parser takes not only the hypotheses of the finite-state machine into account, but also incorporates the preprocessor’s probability estimate for these hypotheses, and thus improves its own estimate for the given structure. This combination method attains our goals: the system is simple and, at the same time, efficient and accurate.
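The difference between hard and soft constraints mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's model: the scoring rule (multiplying the parser's probability by an independent per-word preprocessor probability), the function names, and all numbers below are assumptions made purely for illustration.

```python
import math

def soft_score(parser_prob, marked, site_probs):
    """Log-score of one candidate analysis under a soft constraint.

    parser_prob: the parser's own probability for the analysis.
    marked: set of word positions the analysis treats as non-local sites.
    site_probs: hypothetical preprocessor probabilities, one per word,
        that the word takes part in a non-local construction.
        Each value must lie strictly between 0 and 1.
    """
    score = math.log(parser_prob)
    for pos, p in enumerate(site_probs):
        # Reward agreement with the preprocessor, penalise disagreement.
        score += math.log(p if pos in marked else 1.0 - p)
    return score

def hard_filter(candidates, site_probs, threshold=0.5):
    """Keep only analyses that match the preprocessor's 1-best decisions."""
    predicted = {i for i, p in enumerate(site_probs) if p >= threshold}
    return [c for c in candidates if c[1] == predicted]

# Made-up candidates: (name, marked positions, parser probability).
candidates = [
    ("analysis_a", {2}, 0.60),
    ("analysis_b", set(), 0.20),
]
# Made-up preprocessor output: word 2 is a borderline case.
site_probs = [0.05, 0.10, 0.45]

best_soft = max(candidates, key=lambda c: soft_score(c[2], c[1], site_probs))
kept_hard = hard_filter(candidates, site_probs)
```

With these numbers the hard filter discards `analysis_a` because the preprocessor's probability for word 2 falls just under the threshold, whereas the soft score still prefers `analysis_a`: the parser's strong preference outweighs the preprocessor's weak doubt.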


Similar resources

PLCFRS Parsing of English Discontinuous Constituents

This paper proposes a direct parsing of non-local dependencies in English. To this end, we use probabilistic linear context-free rewriting systems for data-driven parsing, following recent work on parsing German. In order to do so, we first perform a transformation of the Penn Treebank annotation of non-local dependencies into an annotation using crossing branches. The resulting treebank can be...


Data and Models for Statistical Parsing with Combinatory Categorial Grammar

This dissertation is concerned with the creation of training data and the development of probability models for statistical parsing of English with Combinatory Categorial Grammar (CCG). Parsing, or syntactic analysis, is a prerequisite for semantic interpretation, and forms therefore an integral part of any system which requires natural language understanding. Since almost all naturally occurri...


Online Graph Planarisation for Synchronous Parsing of Semantic and Syntactic Dependencies

This paper investigates a generative history-based parsing model that synchronises the derivation of non-planar graphs representing semantic dependencies with the derivation of dependency trees representing syntactic structures. To process non-planarity online, the semantic transition-based parser uses a new technique to dynamically reorder nodes during the derivation. While the synchronised de...


Exploiting Non-Local Features for Spoken Language Understanding

In this paper, we exploit non-local features as an estimate of long-distance dependencies to improve performance on the statistical spoken language understanding (SLU) problem. The statistical natural language parsers trained on text perform unreliably to encode non-local information on spoken language. An alternative method we propose is to use trigger pairs that are automatically extracted by...


Statistical Parsing with an Automatically-Extracted Tree Adjoining Grammar

Why use tree adjoining grammars (TAG) for statistical parsing? It might be thought that its added formal power makes parameter estimation unnecessarily difficult; or that whatever benefits it provides—the ability to model unbounded cross-serial dependencies, for example— are inconsequential for statistical parsing, which is concerned with the probable rather than the possible. But just as TAG i...



Publication date: 2005